Determining accurate bird's eye view (BEV) positions of objects and tracks in a scene is vital for various perception tasks, including mapping object interactions and scenario extraction; however, the level of supervision required to accomplish this is extremely challenging to procure. We propose a lightweight, weakly supervised method that estimates the 3D position of objects by jointly learning to regress 2D object detections and the scene's depth prediction in a single feed-forward pass of a network. Our method extends a center-point-based single-shot object detector \cite{zhou2019objects} and introduces a novel object representation in which each object is modeled spatio-temporally as a BEV point, without the need for any 3D or BEV annotations during training or LiDAR data at query time. The approach leverages readily available 2D object supervision along with LiDAR point clouds (used only during training) to jointly train a single network that predicts 2D object detections alongside the whole scene's depth, thereby modeling object tracks spatio-temporally as points in BEV. The proposed method is computationally $\sim$10x more efficient than recent SOTA approaches [1, 38] while achieving comparable accuracy on the KITTI tracking benchmark.
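As a minimal sketch of the underlying geometry (assuming a pinhole camera model with known intrinsics; this is illustrative, not the authors' released code), the BEV position of a detected object can be recovered by back-projecting its 2D center keypoint through the predicted depth map:

```python
import numpy as np

def center_to_bev(center_uv, depth, K):
    """Back-project a detected 2D object center to a BEV point.

    center_uv: (u, v) pixel coordinates of the object's center keypoint.
    depth:     depth sampled from the predicted depth map at (u, v).
    K:         3x3 camera intrinsic matrix.
    Returns the (x, z) ground-plane (BEV) position in the camera frame.
    """
    uv1 = np.array([center_uv[0], center_uv[1], 1.0])
    xyz = depth * (np.linalg.inv(K) @ uv1)  # 3D point in the camera frame
    return xyz[0], xyz[2]                   # lateral (x) and forward (z) form the BEV point

# Example with KITTI-like (hypothetical) intrinsics:
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
x, z = center_to_bev((640.0, 180.0), depth=15.2, K=K)
```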
Autonomous driving has a natural bi-level structure. The goal of the upper, behavioural layer is to provide appropriate lane-change, speed-up, and braking decisions to optimize a given driving task. However, this layer can only indirectly influence driving efficiency through the lower-level trajectory planner, which takes the behavioural inputs and produces motion commands. Existing sampling-based approaches do not fully exploit the strong coupling between the behavioural and planning layers. On the other hand, end-to-end Reinforcement Learning (RL) can learn a behavioural layer while incorporating feedback from the lower-level planner, but purely data-driven approaches often fail on safety metrics in unseen environments. This paper presents a novel alternative: a parameterized bi-level optimization that jointly computes the optimal behavioural decisions and the resulting downstream trajectory. Our approach runs in real time using a custom GPU-accelerated batch optimizer and a warm-start strategy learned with a Conditional Variational Autoencoder. Extensive simulations show that our approach outperforms state-of-the-art model predictive control and RL approaches in collision rate while remaining competitive in driving efficiency.
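A toy sketch of the bi-level structure described above (all function names, costs, and the uniform behavioural sampling are illustrative stand-ins; the paper's method uses a GPU-accelerated batch optimizer with a CVAE-learned warm start rather than the naive sampling below):

```python
import numpy as np

def lower_level_plan(behaviour, state):
    """Hypothetical lower-level trajectory optimizer: maps a behavioural
    decision (desired lane offset, cruise speed) to a trajectory and its cost."""
    lane_offset, cruise_speed = behaviour
    traj = np.zeros((50, 4))                                 # placeholder (x, y, v, theta) rollout
    cost = (cruise_speed - state["v"])**2 + lane_offset**2   # placeholder planning cost
    return traj, cost

def bilevel_step(state, n_samples=64):
    """Upper level: sample candidate behavioural decisions in a batch
    (GPU-amenable), score each via the lower-level planner, keep the best."""
    lane_offsets = np.random.choice([-3.5, 0.0, 3.5], size=n_samples)
    cruise_speeds = np.random.uniform(5.0, 20.0, size=n_samples)
    best = min(
        (lower_level_plan((lo, cs), state) for lo, cs in zip(lane_offsets, cruise_speeds)),
        key=lambda traj_cost: traj_cost[1],
    )
    return best  # (best trajectory, its cost)

traj, cost = bilevel_step({"v": 12.0})
```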
Large language models (LLMs) have been shown to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built through a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
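A minimal usage sketch with the Hugging Face `transformers` library, assuming the publicly released `bigscience/bloom-560m` checkpoint (one of the smaller BLOOM variants; the full 176B model is impractical to load on a single machine):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small released BLOOM checkpoint and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# Few-shot / instruction-style prompting as described in the abstract.
inputs = tokenizer("Translate to French: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```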
Many commodity sensors that measure the states of the robot and of dynamic obstacles have non-Gaussian noise characteristics. Yet most current approaches model the underlying uncertainty in motion and perception as Gaussian, primarily to ensure computational tractability. On the other hand, existing planners that do work with non-Gaussian uncertainty do not exploit distributional characteristics of the motion and perception noise, such as bias, for efficient collision avoidance. This paper fills this gap by interpreting reactive collision avoidance as a distribution-matching problem between the distribution of collision-constraint violations and the Dirac delta distribution. To ensure fast reactivity of the planner, we embed each distribution in a Reproducing Kernel Hilbert Space and reformulate the distribution matching as minimizing the Maximum Mean Discrepancy (MMD) between the two distributions. We show that evaluating the MMD for a given control input reduces to just matrix-matrix products. We leverage this insight to develop a simple control-sampling approach for avoiding dynamic and uncertain obstacles. We advance the state of the art in two respects. First, we conduct an extensive empirical study showing that our planner can infer distributional bias from sample-level information and use this insight to guide the robot to good homotopies. We also highlight how a Gaussian approximation of the underlying uncertainty loses the bias estimate and steers the robot into states with a high probability of collision. Second, we show tangible comparative advantages of the proposed distribution-matching approach over previous non-parametric and Gaussian-approximation-based reactive collision avoidance methods.
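To make the MMD computation concrete, here is a minimal sketch (with an assumed RBF kernel and stand-in violation samples; not the paper's implementation) of scoring sampled controls by their squared MMD to a Dirac delta at zero. Note that both kernel evaluations reduce to matrix products, as the abstract states:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF Gram matrix k(a, b) = exp(-gamma * ||a - b||^2), via matrix products."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def mmd_to_dirac(violations, gamma=1.0):
    """Squared MMD between the sample distribution of collision-constraint
    violations and a Dirac delta at zero (the ideal, violation-free case)."""
    X = violations.reshape(-1, 1)   # samples of the constraint violation
    Y = np.zeros((1, 1))            # the Dirac delta's single "sample"
    Kxx = rbf_kernel(X, X, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    return Kxx.mean() - 2.0 * Kxy.mean() + Kyy.mean()

# Control sampling: pick the control whose violation samples sit closest
# (in MMD) to the Dirac delta, i.e. are most nearly collision-free.
controls = np.random.uniform(-1.0, 1.0, size=(32, 2))    # hypothetical candidates
violations = np.maximum(0.0, np.random.randn(32, 100))   # stand-in rollout violations
best = controls[np.argmin([mmd_to_dirac(v) for v in violations])]
```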
Language models demonstrate both quantitative improvements and new qualitative capabilities as their scale increases. Despite their potentially transformative impact, these new capabilities remain poorly characterized. To inform future research, prepare for disruptive new model capabilities, and mitigate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers, spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; and social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
We consider the problem of an agent/robot with non-holonomic kinematics avoiding many dynamic obstacles. State and velocity noise of both the robot and the obstacles, as well as the robot's control noise, are modeled as non-parametric distributions, since the Gaussian assumption on noise models is violated in real-world settings. Under these assumptions, we formulate a robust MPC that efficiently samples robot controls that drive the robot toward the goal state while avoiding obstacles under the duress of such non-parametric noise. In particular, the MPC incorporates a distribution-matching cost that effectively aligns the distribution of the current collision cone with a desired distribution whose samples are collision-free. This cost is posed as a distance function in a Hilbert space, and its minimization typically results in the collision-cone samples becoming collision-free. We compare against, and show tangible performance gains over, methods that linearize Gaussian approximations of the original non-parametric state and obstacle distributions. We also show superior performance over methods that construct Gaussian approximations of the non-parametric noise without subjecting those approximations to further linearization. The performance gains are shown both in trajectory length and in control cost, attesting to the efficacy of the proposed method. To the best of our knowledge, this is the first presentation of moving-obstacle avoidance in the presence of non-parametric state, velocity, and actuator noise models.
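A simplified sketch of the control-sampling step under non-parametric noise (the collision-cone algebra is standard; the skewed noise, radii, and control grid are illustrative assumptions, and the full method replaces the mean-violation score below with the Hilbert-space distribution-matching cost):

```python
import numpy as np

def collision_cone_violation(rel_pos, rel_vel, R):
    """Constant-velocity collision cone: a positive value means the projected
    closest-approach distance falls below the combined safety radius R."""
    r_dot_v = (rel_pos * rel_vel).sum(-1)
    f = r_dot_v**2 - (rel_vel**2).sum(-1) * ((rel_pos**2).sum(-1) - R**2)
    return np.maximum(0.0, f) * (r_dot_v < 0.0)  # only approaching samples can collide

rng = np.random.default_rng(0)
# Non-parametric noise: keep raw, skewed particles rather than fitting a Gaussian.
obs_pos = rng.normal([5.0, 0.0], 0.3, size=(500, 2)) + rng.exponential(0.2, (500, 2))
obs_vel = rng.normal([-1.0, 0.0], 0.1, size=(500, 2))
robot_pos = np.zeros(2)

candidate_vels = rng.uniform(-1.5, 1.5, size=(64, 2))  # sampled robot controls
scores = [collision_cone_violation(obs_pos - robot_pos, obs_vel - v, R=1.0).mean()
          for v in candidate_vels]
best_vel = candidate_vels[int(np.argmin(scores))]
```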
This paper presents a Model Predictive Control (MPC) algorithm for target tracking amongst static and dynamic obstacles. Our main contribution lies in improving the computational tractability and reliability of the underlying non-convex trajectory optimization. The result is an MPC algorithm that runs in real time on laptops and on embedded hardware devices such as the Jetson TX2. Our approach relies on novel reformulations of the tracking, collision, and occlusion constraints that induce a multi-convex structure in the resulting trajectory optimization. We exploit these mathematical structures using the split Bregman iteration technique, eventually reducing our MPC to a sequence of convex quadratic programs solvable in a few milliseconds. Even with simple constant-velocity predictions for the target trajectory and the dynamic obstacles, our fast re-planning allows occlusion-free and collision-free tracking in complex environments. We perform extensive benchmarking in a realistic physics engine and show that our MPC outperforms state-of-the-art algorithms in visibility, smoothness, and computation-time metrics.
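The split Bregman mechanism can be illustrated on a toy L1-regularized least-squares problem (this is the textbook algorithm, not the paper's MPC; the point is that each iteration reduces to a convex quadratic step whose matrix is fixed and whose factorization can therefore be cached):

```python
import numpy as np

def shrink(v, t):
    """Soft thresholding: closed-form minimizer of t*|d| + 0.5*(d - v)^2."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def split_bregman(A, b, lam=0.1, rho=1.0, iters=50):
    """Split Bregman iteration for min 0.5*||Ax - b||^2 + lam*||x||_1,
    via the splitting x = d and a Bregman (dual) variable u."""
    n = A.shape[1]
    x, d, u = np.zeros(n), np.zeros(n), np.zeros(n)
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))  # computed once, reused every iteration
    for _ in range(iters):
        x = M @ (A.T @ b + rho * (d - u))  # convex quadratic (QP-like) step
        d = shrink(x + u, lam / rho)       # closed-form proximal step on the split variable
        u = u + x - d                      # Bregman/dual update
    return x

x_hat = split_bregman(np.random.randn(30, 10), np.random.randn(30))
```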
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras and two stereo cameras, in addition to lidar point clouds and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest collection of lidar sensor data ever released and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with predicting the future motion of "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD map with 3D lane and crosswalk geometry, sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
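A hypothetical schema (not the official av2 devkit API) illustrating the per-actor fields the Motion Forecasting Dataset is described as providing:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrackState:
    x: float        # map-frame position (m)
    y: float
    heading: float  # yaw (rad)
    vx: float       # velocity (m/s)
    vy: float

@dataclass
class ActorTrack:
    actor_id: str
    category: str             # e.g. "vehicle", "pedestrian", "cyclist"
    is_scored: bool           # models must predict future motion for scored actors
    history: List[TrackState] # observed track history used as model input
```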
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice, or about the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the receipt of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%), and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
The core of the computer business now offers subscription-based, on-demand services with the help of cloud computing. We can now share resources among multiple users through virtualization, which creates a virtual instance of a computer system running in an abstracted hardware layer. In contrast to early distributed computing models, cloud computing provides practically unlimited computing capability through its massive datacenters, and it has become incredibly popular in recent years owing to its continually growing infrastructure, user base, and hosted data volume. This article proposes a conceptual framework for a workload management paradigm in cloud settings that is both secure and performance-efficient. In this paradigm, a resource management unit performs energy- and performance-efficient virtual machine allocation, ensures the safe execution of users' applications, and protects in real time against data breaches caused by unauthorised virtual machine access. A secure virtual machine management unit controls the resource management unit and is designed to produce data on unlawful access or intercommunication. Additionally, a workload analyzer unit works concurrently to estimate resource-consumption data, helping the resource management unit to be more effective during virtual machine allocation. The proposed model combines several mechanisms that serve the same objective, including data encryption and decryption prior to transfer and a trust-based access mechanism to prevent unauthorised access to virtual machines, at the price of extra computational overhead.
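A minimal conceptual sketch (all class and method names are hypothetical) of how the three units described above could interact during a virtual machine allocation:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class WorkloadAnalyzer:
    history: list = field(default_factory=list)  # (vm_id, observed usage) pairs

    def estimate_usage(self, vm_id: str) -> float:
        """Estimate resource consumption from past observations (stub: mean)."""
        samples = [u for v, u in self.history if v == vm_id]
        return sum(samples) / len(samples) if samples else 0.5

@dataclass
class SecureVMManager:
    trusted: set = field(default_factory=set)  # authorized (user, vm_id) pairs

    def authorize(self, user: str, vm_id: str) -> bool:
        """Trust-based access check; report unlawful access attempts."""
        if (user, vm_id) not in self.trusted:
            print(f"ALERT: unauthorised access attempt by {user} on {vm_id}")
            return False
        return True

class ResourceManager:
    def __init__(self, analyzer: WorkloadAnalyzer, security: SecureVMManager):
        self.analyzer, self.security = analyzer, security

    def allocate(self, user: str, vm_id: str, hosts: dict) -> Optional[str]:
        """Allocate a VM only for authorized users, on the host whose spare
        capacity most tightly fits the estimated demand (best-fit placement)."""
        if not self.security.authorize(user, vm_id):
            return None
        demand = self.analyzer.estimate_usage(vm_id)
        fits = {h: cap - demand for h, cap in hosts.items() if cap >= demand}
        return min(fits, key=fits.get) if fits else None
```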